VEWS: A Wikipedia Vandal Early Warning System
We study the problem of detecting vandals on Wikipedia before any human or
known vandalism detection system flags them, so that potential vandals can be
presented early to Wikipedia administrators. We leverage
multiple classical ML approaches, but develop 3 novel sets of features. Our
Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing
patterns as features to classify some users as vandals. Our Wikipedia
Transition Probability Matrix (WTPM) approach uses a set of features derived
from a transition probability matrix and then reduces it via a neural net
auto-encoder to classify some users as vandals. The VEWS approach merges the
previous two approaches. Without using any information (e.g. reverts) provided
by other users, these algorithms each have over 85% classification accuracy.
Moreover, when temporal recency is considered, accuracy goes to almost 90%. We
carry out detailed experiments on a new data set we have created consisting of
about 33K Wikipedia users (including both a black list and a white list of
editors) and containing 770K edits. We describe specific behaviors that
distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG
and STiki, the best known algorithms today for vandalism detection. Moreover,
VEWS detects far more vandals than ClueBot NG and on average, detects them 2.39
edits before ClueBot NG when both detect the vandal. However, we show that the
combination of VEWS and ClueBot NG can give a fully automated vandal early
warning system with even higher accuracy.
Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015).
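The WTPM approach described above derives features from a transition probability matrix over a user's consecutive edits. The following is a minimal sketch of that idea; the edit-type alphabet and function names are illustrative assumptions, not the paper's actual feature set, and the neural-net autoencoder compression step is omitted.

```python
# Hypothetical sketch of WTPM-style features: turn a user's sequence of edit
# types into a row-normalized transition probability matrix, then flatten it
# into a raw feature vector. EDIT_TYPES is an assumed alphabet.
EDIT_TYPES = ["new_page", "same_page_reedit", "other_page", "meta_page"]

def transition_matrix(edit_sequence):
    """Row-normalized transition probabilities over consecutive edit types."""
    idx = {t: i for i, t in enumerate(EDIT_TYPES)}
    n = len(EDIT_TYPES)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(edit_sequence, edit_sequence[1:]):
        counts[idx[a]][idx[b]] += 1
    for row in counts:
        total = sum(row)
        if total:
            for j in range(n):
                row[j] /= total
    return counts

def wtpm_features(edit_sequence):
    """Flatten the matrix into the raw vector that the paper then compresses
    with a neural-net autoencoder (compression not shown here)."""
    return [p for row in transition_matrix(edit_sequence) for p in row]

seq = ["new_page", "same_page_reedit", "same_page_reedit", "other_page"]
vec = wtpm_features(seq)  # 16-dimensional vector for a 4-type alphabet
```

In the paper the compressed vector, not the raw matrix, is fed to the classifier; the point of the sketch is only the matrix construction.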
Hybrid knowledge systems
An architecture called the hybrid knowledge system (HKS) is described that can be used to interoperate between: a specification of the control laws describing a physical system; a collection of databases, knowledge bases, and/or other data structures reflecting information about the world in which the controlled physical system resides; observations (e.g. sensor information) from the external world; and actions that must be taken in response to external observations.
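The components listed above suggest a mediator that routes queries across heterogeneous knowledge sources and maps observations to actions. This is a minimal, hypothetical sketch under that reading; all class and method names are my own illustration, not the HKS paper's interfaces.

```python
# Hypothetical HKS-style mediator: registered knowledge sources answer
# queries, and (condition, action) rules fire on incoming observations.
class HybridKnowledgeSystem:
    def __init__(self):
        self.sources = {}   # name -> callable answering queries
        self.rules = []     # (condition, action) pairs over observations

    def register_source(self, name, query_fn):
        self.sources[name] = query_fn

    def query(self, name, *args):
        return self.sources[name](*args)

    def add_rule(self, condition, action):
        self.rules.append((condition, action))

    def observe(self, observation):
        """Fire every rule whose condition matches the observation."""
        return [action(observation) for condition, action in self.rules
                if condition(observation)]

hks = HybridKnowledgeSystem()
hks.register_source("db", lambda key: {"t1": 100}.get(key))  # toy database
hks.add_rule(lambda obs: obs["temp"] > hks.query("db", "t1"),
             lambda obs: "reduce_power")
actions = hks.observe({"temp": 120})  # rule fires: temp exceeds stored limit
```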
Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors
Although machine learning based algorithms have been extensively used for
detecting phishing websites, there has been relatively little work on how
adversaries may attack such "phishing detectors" (PDs for short). In this
paper, we propose a set of Gray-Box attacks on PDs that an adversary may use
which vary depending on the knowledge that he has about the PD. We show that
these attacks severely degrade the effectiveness of several existing PDs. We
then propose the concept of operation chains that iteratively map an original
set of features to a new set of features and develop the "Protective Operation
Chain" (POC for short) algorithm. POC leverages the combination of random
feature selection and feature mappings in order to increase the attacker's
uncertainty about the target PD. Using 3 existing publicly available datasets
plus a fourth that we have created and will release upon the publication of
this paper, we show that POC is more robust to these attacks than past
competing work, while preserving predictive performance when no adversarial
attacks are present. Moreover, POC is robust to attacks on 13 different
classifiers, not just one. These results are shown to be statistically
significant at the p < 0.001 level.
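The abstract describes operation chains that combine random feature selection with iterated feature mappings to increase the attacker's uncertainty. This is a toy sketch of that combination; the concrete mappings and parameters are illustrative assumptions, not the paper's POC algorithm.

```python
# Hypothetical sketch of an "operation chain": keep a secret random subset of
# features, then iteratively remap them, so an attacker cannot know which
# transformed feature space the deployed detector actually uses.
import random

def protective_operation_chain(features, n_keep, chain_length, seed):
    rng = random.Random(seed)  # the defender's secret randomness
    mappings = [lambda x: x * x, lambda x: x + 1.0, lambda x: -x]
    kept = rng.sample(range(len(features)), n_keep)   # random feature selection
    vec = [features[i] for i in kept]
    for _ in range(chain_length):   # iteratively remap the selected features
        f = rng.choice(mappings)
        vec = [f(x) for x in vec]
    return vec

raw = [0.2, 1.5, 3.0, 0.7, 2.1]
transformed = protective_operation_chain(raw, n_keep=3, chain_length=2, seed=41)
```

The same seed reproduces the same chain at training and inference time, while an attacker without the seed faces a combinatorial space of possible feature mappings.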
Strong Completeness Results for Paraconsistent Logic Programming
In [6], we introduced a means of allowing logic programs to contain negations in both the head and the body of a clause. Such programs were called generally Horn programs (GHPs, for short). The model-theoretic semantics of GHPs were defined in terms of four-valued Belnap lattices [5]. For a class of programs called well-behaved programs, an SLD-resolution like proof procedure was introduced. This procedure was proven (under certain restrictions) to be sound (for existential queries) and complete (for ground queries). In this paper, we remove the restriction that programs be well-behaved and extend our soundness and completeness results to apply to arbitrary existential queries and to arbitrary GHPs. This is the strongest possible completeness result for GHPs. The results reported here apply to the design of very large knowledge bases and in processing queries to knowledge bases that possibly contain erroneous information
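The four-valued Belnap lattice underlying the semantics above can be made concrete with a small sketch. The encoding below (a value as a pair of evidence bits) and the function name are my own illustration of Belnap's FOUR, not notation from the paper.

```python
# Sketch of Belnap's four truth values, encoded as
# (has_evidence_true, has_evidence_false) pairs.
NONE  = (False, False)   # no information
TRUE  = (True,  False)
FALSE = (False, True)
BOTH  = (True,  True)    # contradictory information

def k_join(a, b):
    """Knowledge-order least upper bound: union the evidence from both
    sources. This is how paraconsistent semantics combines possibly
    conflicting information about the same literal."""
    return (a[0] or b[0], a[1] or b[1])

# One source asserting p and another denying p combine to BOTH, so a GHP can
# tolerate a local contradiction without every query becoming derivable.
result = k_join(TRUE, FALSE)   # BOTH
```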
Abduction in Annotated Probabilistic Temporal Logic
Annotated Probabilistic Temporal (APT) logic programs are a form of logic programs that allow users to state (or systems to automatically learn) rules of the form "formula G becomes true K time units after formula F became true with L to U% probability."
In this paper, we develop a theory of abduction for APT logic programs. Specifically, given an APT logic program Pi, a set of formulas H that can be "added" to Pi, and a goal G, is there a subset S of H such that Pi ∪ S is consistent and entails the goal G? We study the complexity of the Basic APT Abduction Problem (BAAP). We then leverage a geometric characterization of BAAP to suggest a set of pruning strategies when solving BAAP and use these intuitions to develop a sound and complete algorithm
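The decision problem just stated can be sketched as a brute-force subset search. The propositional stand-ins for consistency and entailment below are illustrative only; the paper works over APT logic programs and develops pruning strategies precisely to avoid this exhaustive enumeration.

```python
# Brute-force sketch of BAAP: search subsets S of the hypothesis set H for
# one where Pi ∪ S is consistent and entails the goal G.
from itertools import combinations

def baap_brute_force(pi, hypotheses, goal, consistent, entails):
    """Return a smallest S ⊆ hypotheses with consistent(Pi ∪ S) and
    entails(Pi ∪ S, goal), or None if no such subset exists."""
    for k in range(len(hypotheses) + 1):   # smallest subsets first
        for subset in combinations(hypotheses, k):
            program = pi | set(subset)
            if consistent(program) and entails(program, goal):
                return set(subset)
    return None

# Toy propositional instance: facts are atoms, a program is consistent when
# it never contains both "x" and "-x", and it entails g when g is a member.
consistent = lambda prog: not any(("-" + a) in prog for a in prog)
entails = lambda prog, g: g in prog
s = baap_brute_force({"p"}, {"q", "-p", "g"}, "g", consistent, entails)
# s is {"g"}: adding "-p" instead would make the program inconsistent
```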
Deception Detection in Group Video Conversations using Dynamic Interaction Networks
Detecting groups of people who are jointly deceptive in video conversations
is crucial in settings such as meetings, sales pitches, and negotiations. Past
work on deception in videos focuses on detecting a single deceiver and uses
facial or visual features only. In this paper, we propose the concept of
Face-to-Face Dynamic Interaction Networks (FFDINs) to model the interpersonal
interactions within a group of people. The use of FFDINs enables us to leverage
network relations in detecting group deception in video conversations for the
first time. We use a dataset of 185 videos from a deception-based game called
Resistance. We first characterize the behavior of individuals, pairs, and groups
of deceptive participants and compare them to non-deceptive participants. Our
analysis reveals that pairs of deceivers tend to avoid mutual interaction and
focus their attention on non-deceivers. In contrast, non-deceivers interact
with everyone equally. We propose Negative Dynamic Interaction Networks (NDINs)
to capture the notion of missing interactions. We create the DeceptionRank
algorithm to detect deceivers from NDINs extracted from videos that are just
one minute long. We show that our method outperforms recent state-of-the-art
computer vision, graph embedding, and ensemble methods by at least 20.9% AUROC
in identifying deception from videos.
Comment: The paper is published at ICWSM 2021. Dataset link:
https://snap.stanford.edu/data/comm-f2f-Resistance.htm
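DeceptionRank scores participants over a network of missing interactions. The sketch below is only a generic PageRank-style power iteration over such a graph, to illustrate the kind of centrality computation involved; the actual DeceptionRank algorithm and the NDIN construction in the paper differ.

```python
# Generic PageRank-style power iteration over a directed graph where an edge
# u -> v means u avoided interacting with v (a toy negative-interaction net).
def pagerank(adj, damping=0.85, iters=100):
    """adj: dict node -> list of out-neighbors. Returns node -> score."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# "c" is avoided by both "a" and "b", so it accumulates the most
# missing-interaction mass and scores highest.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```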